Proc. 5th Workshop on Highly Parallel Processing on a Chip
Abstract
We present a framework for representing image processing kernels based on decoupled access/execute metadata, which allow the programmer to specify both the execution constraints and the memory access pattern of a kernel. The framework performs source-to-source translation of kernels expressed in high-level, framework-specific C++ classes into low-level CUDA or OpenCL code with effective device-dependent optimizations, such as global memory padding for memory coalescing and optimal memory bandwidth utilization. We evaluate the framework on several image filters, comparing the generated code against highly optimized CPU and GPU versions in the popular OpenCV library.
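The global memory padding mentioned in the abstract can be illustrated with a small host-side sketch. This is not the framework's API or its generated code: the function names `padded_pitch` and `pad_image` are hypothetical, and the alignment value passed in by the caller is an illustrative assumption rather than a figure from the paper. The idea is only that each image row is stretched to a multiple of the alignment so that every row starts on an aligned address, which is the precondition for coalesced row-wise accesses on a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the paper): round a row width in bytes up to
// the next multiple of `alignment`, so that every row starts aligned.
std::size_t padded_pitch(std::size_t row_bytes, std::size_t alignment) {
    assert(alignment > 0);
    return (row_bytes + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: copy a tightly packed width x height float image into
// a buffer whose rows are padded to `alignment_bytes`. On return, pitch_elems
// holds the row stride of the padded buffer, measured in elements.
std::vector<float> pad_image(const std::vector<float>& src,
                             std::size_t width, std::size_t height,
                             std::size_t alignment_bytes,
                             std::size_t& pitch_elems) {
    assert(src.size() == width * height);
    assert(alignment_bytes % sizeof(float) == 0);
    pitch_elems =
        padded_pitch(width * sizeof(float), alignment_bytes) / sizeof(float);
    std::vector<float> dst(pitch_elems * height, 0.0f);  // padding stays zero
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            dst[y * pitch_elems + x] = src[y * width + x];
        }
    }
    return dst;
}
```

A kernel reading the padded buffer would index pixel (x, y) as `y * pitch_elems + x` instead of `y * width + x`; the price is the unused tail of each row.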
Similar resources
Design and Implementation of a High Speed Systolic Serial Multiplier and Squarer for Long Unsigned Integer Using VHDL
A systolic serial multiplier for unsigned numbers is presented which operates without zero words inserted between successive data words, outputs the full product and has only one clock cycle latency. The multiplier is based on a modified serial/parallel scheme with two adjacent multiplier cells. Systolic concept is a well-known means of intensive computational task through replication of fu...
Message from the ICPP Workshop Co-chairs
Workshop on Web Services-based Grid Applications (WSGA), Monday, August 14th, morning; Workshop on Parallel and Distributed Multimedia (PDM), Monday, August 14th, morning; Workshop on Wireless and Sensor Networks (WSNet), Monday, August 14th, all day; 3rd International Workshop on Embedded Computing (EC-06), Monday, August 14th, all day; Workshop on Performance Evaluation of Networks for Parallel, ...
Parallel embedded processor architecture for FPGA-based image processing using parallel software skeletons
Today, the problem of designing suitable multiprocessor architecture tailored for a target application field raises the need for a fast and efficient multiprocessor system-on-chip (MPSoC) design environment. Additionally, the implementation of image processing applications on MPSoC system will need to exploit the parallelism and the pipelining in algorithms with the hope of delivering significa...
HPPC'09 Workshop Proceedings
Efficient NoC is crucial for communication among processing elements in a highly parallel processing systems on chip. Mapping cores to slots in a NoC platform and designing efficient routing algorithms are two key problems in NoC design. Source routing offers major advantages over distributed routing especially for regular topology NoC platforms. But it suffers from a serious drawback of overhe...
Publication date: 2011